ABSTRACT


We introduce GOTCHAs (Generating panOptic Turing Tests 
to Tell Computers and Humans Apart) as a way of pre- 
venting automated offline dictionary attacks against user 
selected passwords. A GOTCHA is a randomized puzzle 
generation protocol, which involves interaction between a 
computer and a human. Informally, a GOTCHA should 
satisfy two key properties: (1) The puzzles are easy for the 
human to solve. (2) The puzzles are hard for a computer 
to solve even if it has the random bits used by the com- 
puter to generate the final puzzle — unlike a CAPTCHA 
[43]. Our main theorem demonstrates that GOTCHAs can 
be used to mitigate the threat of offline dictionary attacks 
against passwords by ensuring that a password cracker must 
receive constant feedback from a human being while mount- 
ing an attack. Finally, we provide a candidate construction 
of GOTCHAs based on Inkblot images. Our construction re- 
lies on the usability assumption that users can recognize the 
phrases that they originally used to describe each Inkblot 
image — a much weaker usability assumption than previous 
password systems based on Inkblots which required users 
to recall their phrase exactly. We conduct a user study to 
evaluate the usability of our GOTCHA construction. We 
also generate a GOTCHA challenge where we encourage ar- 
tificial intelligence and security researchers to try to crack 
several passwords protected with our scheme. 
1. INTRODUCTION


Any adversary who has obtained the cryptographic hash 
of a user’s password can mount an automated brute-force at- 
tack to crack the password by comparing the cryptographic 
hash of the user’s password with the cryptographic hashes 
of likely password guesses. This attack is called an offline 
dictionary attack, and there are many password crackers 
that an adversary could use [7]. Offline dictionary at- 
tacks against passwords are — unfortunately — powerful 
and commonplace [25]. Adversaries have been able to com- 
promise servers at large companies (e.g., Zappos, LinkedIn, 
Sony, Gawker |B, 2, 9, Æ, M, B) resulting in the release of mil- 
lions of cryptographic password hashes". It has been repeat- 
edly demonstrated that users tend to select easily guessable 
passwords B7, IX, [Hl], and password crackers are able to 
quickly break many of these passwords|[B9]. Offline attacks 
are becoming increasingly dangerous as computing hardware 
improves — a modern GPU can evaluate a cryptographic 
hash function like SHA2 about 250 million times per sec- 
ond [49] — and as more and more training data — leaked 
passwords from prior breaches — becomes available [25]. 
Symantec reported that compromised passwords have sig- 
nificant economic value to an adversary (e.g., compromised 
passwords are sold on black market for between $4 and $30 
) A. 

HOSPs (Human-Only Solvable Puzzles) were suggested by 
Canetti, Halevi and Steiner as a way of defending against
offline dictionary attacks [14]. The basic idea is to change the
authentication protocol so that human interaction is required 
to verify a password guess. The authentication protocol be- 
gins with the user entering his password. In response the 
server randomly generates a challenge — using the pass- 
word as a source of randomness — for the user to solve. 
Finally, the server appends the user’s response to the user’s 
password, and verifies that the hash matches the record on 
the server. To crack the user’s password offline the adver- 
sary must simultaneously guess the user’s password and the 
answer to the corresponding puzzle. The challenge should
be easy for a human to solve consistently so that a legitimate
user can authenticate. To mitigate the threat of an
offline dictionary attack the HOSP should be difficult for a
computer to solve, even if it has all of the random bits
used to generate the challenge.

¹ In a few of these cases |B, M] the passwords were stored in
the clear.
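The verification step described above can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 stands in for the cryptographic hash, the salt and the names `hosp_record` and `hosp_verify` are hypothetical, and the puzzle itself is not modeled. The point is that the stored record depends on both the password and the human's response, so an offline guess must get both right.

```python
import hashlib
import hmac


def hosp_record(password: str, response: str, salt: bytes) -> bytes:
    """Hash the password together with the human's puzzle response."""
    # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    parts = [salt, password.encode(), response.encode()]
    material = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hashlib.sha256(material).digest()


def hosp_verify(stored: bytes, password: str, response: str, salt: bytes) -> bool:
    """Accept only if both the password and the puzzle response match."""
    return hmac.compare_digest(stored, hosp_record(password, response, salt))


salt = b"example-salt"  # hypothetical fixed salt, for illustration only
stored = hosp_record("hunter2", "evil clown", salt)
```

A correct password with the wrong response is rejected just like a wrong password, which is what forces the adversary to obtain the puzzle answer for every guess.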

The basic HOSP construction proposed by Canetti et al. 
[14] was to fill a hard drive with regular CAPTCHAs (e.g.,
distorted text) by storing the puzzles without the answers. 
This solution only provides limited protection against an ad- 
versary because the number of unique puzzles that can be 
generated is bounded by the size of the hard drive (e.g., the 
adversary could pay people to solve all of the puzzles on the 
hard drive). See appendix B] for more discussion. Finding 
a usable HOSP construction which does not rely on a very 
large dataset of pregenerated CAPTCHAs is an open problem.
Several candidate HOSPs were experimentally tested
[14, 15] (they are called POSHs in the second paper), but the
usability results were underwhelming.


Contributions. 

We introduce a simple modification of HOSPs that we call 
GOTCHAs (Generating panOptic Turing Tests to Tell Com- 
puters and Humans Apart). We use the adjective Panoptic 
to refer to a world without privacy — there are no hidden 
random inputs to the puzzle generation protocol. The basic 
goal of GOTCHAs is similar to the goal of HOSPs — de- 
fending against offline dictionary attacks. GOTCHAs differ 
from HOSPs in two ways (1) Unlike a HOSP a GOTCHA 
may require human interaction during the generation of the 
challenge. (2) We relax the requirement that a user needs 
to be able to answer all challenges easily and consistently. 
If the user can remember his password during the authen- 
tication protocol then he will only ever see one challenge. 
We only require that the user must be able to answer this 
challenge consistently. If the user enters the wrong password 
during authentication then he may see new challenges. We 
do not require that the user must be able to solve these chal- 
lenges consistently because authentication will fail in either 
case. We do require that it is difficult for a computer to dis- 
tinguish between the “correct” challenge and an “incorrect” 
challenge. Our main theorem demonstrates that GOTCHAs 
like HOSPs can be used to defend against offline dictionary 
attacks. The goal of these relaxations is to enable the design 
of usable GOTCHAs. 

We introduce a candidate GOTCHA construction based 
on Inkblot images. While the images are generated ran- 
domly by a computer, the human mind can easily imagine 
semantically meaningful objects in each image. To generate 
a challenge the computer first generates ten inkblot images 
(e.g., figure 1). The user then provides labels for each
image (e.g., evil clown, big frog). During authentication the
challenge is to match each inkblot image with the corre- 
sponding label. We empirically evaluate the usability of our 
inkblot matching GOTCHA construction by conducting a 
user study on Amazon’s Mechanical Turk. Finally, we chal- 
lenge the AI community to break our GOTCHA construc- 
tion. 


Organization. 
The rest of the paper is organized as follows: We next discuss
related work in section 1.1. We formally define GOTCHAs
in section 2 and formalize the properties that a GOTCHA
should satisfy. We present our candidate GOTCHA construction
in section 3, and in section 3.1 we demonstrate
how our GOTCHA could be integrated into an authentication
protocol. In later sections we present the results from our
user study and challenge the AI and security communities to
break our GOTCHA construction. We also prove that GOTCHAs,
like HOSPs, can be used to design a password storage system
which mitigates the threat of offline attacks. We conclude by
discussing future directions and challenges.

Figure 1: Randomly Generated Inkblot Image: An evil clown?


1.1 Related Work 


Inkblots [42] have been proposed as an alternative way to 
generate and remember passwords. Stubblefield and Simon 
proposed showing the user ten randomly generated inkblot 
images, and having the user make up a word or a phrase to 
describe each image. These phrases were then used to build 
a 20 character password (e.g., users were instructed to take 
the first and last letter of each phrase). Usability results 
were moderately good, but users sometimes had trouble re- 
membering their association. Because the Inkblots are pub- 
licly available there is also a security concern that Inkblot 
passwords could be guessable if different users consistently 
picked similar phrases to describe the same Inkblot. 

We stress that our use of Inkblot images is different in two 
ways: (1) Usability: We do not require users to recall the 
word or phrase associated with each Inkblot. Instead we require
users to recognize the word or phrase associated with
each Inkblot so that they can match each phrase with the 
appropriate Inkblot image. Recognition is widely accepted 
to be easier than the task of recall fø, #5]. (2) Security: We 
do not need to assume that it would be difficult for other 
humans to match the phrases with each Inkblot. We only 
assume that it is difficult for a computer to perform this 
matching automatically. 

CAPTCHAs — formally introduced by Von Ahn et al. 
[43] — have gained widespread adoption on the internet to 
prevent bots from automatically registering for accounts. A 
CAPTCHA is a program that generates a puzzle — which 
should be easy for a human to solve and difficult for a com- 
puter to solve — as well as a solution. Many popular forms 
of CAPTCHAs (e.g., reCAPTCHA [4]) generate garbled
text, which is easy² for a human to read, but difficult for
a computer to decipher. Other versions of CAPTCHAs rely
on the natural human capacity for audio [B7] or image
recognition [19].

² Admittedly some people would dispute the use of the label
'easy.'


CAPTCHAs have been used to defend against online pass- 
word guessing attacks — users are sometimes required to 
solve a CAPTCHA before signing into their account. An 
alternative approach is to lock out a user after several incor- 
rect guesses, but this can lead to denial of service attacks 
[16]. However, if the adversary has access to the crypto- 
graphic hash of the user’s password, then he can circum- 
vent all of these requirements and execute an automatic 
dictionary attack to crack the password offline. By contrast
HOSPs, introduced by Canetti et al. [14], were designed to
defend against offline attacks. HOSPs are in some ways similar
to CAPTCHAs (Completely Automated Turing Tests to
Tell Computers and Humans Apart) [43]. CAPTCHAs are 
widely used on the internet to fight spam by preventing bots 
from automatically registering for accounts. In this setting 
a CAPTCHA is sent to the user as a challenge, while the 
secret solution is used to grade the user’s answer. The im- 
plicit assumption is that the answer and the random bits 
used to generate the puzzle remain hidden — otherwise a 
spam bot could simply regenerate the puzzle and the an- 
swer. While this assumption may be reasonable in the spam 
bot setting, it does not hold in our offline password attack 
setting in which the server has already been breached. A 
HOSP is different from a CAPTCHA in several key ways: 
(1) The challenge must remain difficult for a computer to 
solve even if the random bits used to generate the puzzle 
are made public. (2) There is no single correct answer to a 
HOSP. It is okay if different people give different responses 
to a challenge as long as people can respond to the challenges 
easily, and each user can consistently answer the challenges. 

The only HOSP construction proposed in [14] involved 
stuffing a hard drive with unsolved CAPTCHAs. The prob- 
lem of finding a HOSP construction that does not rely on a 
dataset of unsolved CAPTCHAs was left as an open problem
[14]. Several other candidate HOSP constructions have
been experimentally evaluated in subsequent work [15] (they
are called POSHs in the second paper), but the usability
results for every scheme that did not rely on a large dataset
of unsolved CAPTCHAs were underwhelming.

GOTCHAs are very similar to HOSPs. The basic application
(defending against offline dictionary attacks) is
the same, as are the key tools: exploiting the power of
interaction during authentication and exploiting hard artificial
intelligence problems. While the authentication with HOSPs is
interactive, the initial generation of the puzzle is not. By 
contrast, our GOTCHA construction requires human inter- 
action during the initial generation of the puzzle. This sim- 
ple relaxation allows for the construction of new solutions. 
In the HOSP paper humans are simply modeled as a puz- 
zle solving oracle, and the adversary is assumed to have a 
limited number of queries to a human oracle. We introduce 
a more intricate model of the human agent with the goal of 
designing more usable constructions. 


Password Storage. 

Password storage is an incredibly challenging problem. 
Adversaries have been able to compromise servers at many 
large companies (e.g., Zappos, LinkedIn, Sony, Gawker [H, Ø, 
9, Æ, M, B). For example, hackers were able to obtain 32 mil- 
lion plaintext passwords from RockYou using a simple SQL 
injection attack [Il]. While it is considered an extremely poor 
security practice to store passwords in the clear [AI], the 
practice is still fairly common [1, B, M. Many other com- 


panies W, LA] have used cryptographic hashes to store their 
passwords, but failed to adopt the practice of salting (e.g., 
instead of storing the cryptographic hash of the password 
h(pw) the server stores (h(pw, r), r) for a random string r
[6]) to defend against rainbow table attacks. Rainbow ta- 
bles, which consist of precomputed hashes, are often used by 
an adversary to significantly speed up a password cracking 
attack because the same table can be reused to attack each 
user when the passwords are unsalted [B3]. 
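The effect of salting can be illustrated with a short sketch. It assumes SHA-256 as the hash h and a 16-byte random string r; the function names are hypothetical and chosen only for this example. Two users with the same password share one unsalted hash, so a single precomputed table entry breaks both, while their salted records differ.

```python
import hashlib
import os


def unsalted_hash(pw: str) -> str:
    """h(pw): identical passwords always produce identical hashes."""
    return hashlib.sha256(pw.encode()).hexdigest()


def salted_hash(pw: str, r: bytes) -> str:
    """h(pw, r): the random string r is stored next to the hash."""
    return hashlib.sha256(pw.encode() + r).hexdigest()


def new_record(pw: str) -> tuple:
    """Return the pair (h(pw, r), r) for a freshly drawn random string r."""
    r = os.urandom(16)
    return salted_hash(pw, r), r


alice = new_record("123456")
bob = new_record("123456")  # same password, independent salt
```

Verification simply recomputes `salted_hash(pw, r)` with the stored r, so salting costs the server almost nothing while forcing the adversary to attack each user separately.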

Cryptographic hash functions like SHA1, SHA2 and MD5 
— designed for fast hardware computation — are popular 
choices for password hashing. Unfortunately, this allows an 
adversary to try up to 250 million guesses per second on a 
modern GPU [49]. The BCRYPT [B5] hash function was de- 
signed specifically with passwords in mind — BCRYPT was 
intentionally designed to be slow to compute (e.g., to limit 
the power of an adversary’s offline attack). The BCRYPT 
hash function takes a parameter which allows the program- 
mer to specify how costly the hash computation should be. 
The downside to this approach is that it also increases costs 
for the company that stores the passwords (e.g., if we want 
it to cost the adversary $1,000 for every million guesses then 
it will also cost the company at least $1,000 for every million 
login attempts). 
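The idea of a tunable cost parameter can be sketched with the Python standard library. Note the substitution: BCRYPT is not in the standard library, so this example uses PBKDF2-HMAC-SHA256 as a stand-in, and the mapping from `cost` to 2**cost iterations is an assumption made for illustration rather than BCRYPT's actual definition.

```python
import hashlib


def slow_hash(pw: str, salt: bytes, cost: int) -> bytes:
    """Deliberately slow password hash; work grows as 2**cost iterations.

    PBKDF2 is used here only as a stand-in for BCRYPT.
    """
    return hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, 2 ** cost)


salt = b"example-salt"  # hypothetical salt, for illustration only
cheap = slow_hash("hunter2", salt, cost=4)    # 16 iterations
costly = slow_hash("hunter2", salt, cost=10)  # 1024 iterations
```

The same function is evaluated by the server at every login and by the adversary at every guess, which is the symmetry in cost that the paragraph above points out.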

Users are often advised (or required) to follow strict guide- 
lines when selecting their password (e.g., use a mix of up- 
per /lower case letters, include numbers and change the pass- 
word frequently) B8]. However, empirical studies show that 
users are often frustrated by restrictive policies and
commonly forget their passwords [28, 29, 20]³. Furthermore,
the cost of these restrictive policies can be quite high.
For example, a Gartner case study [7] estimated that it cost 
over $17 per password-reset call. Florencio and Herley [21] 
studied the economic factors that institutions consider be- 
fore adopting password policies and found that they often 
value usability over security. 


2. DEFINITIONS 


In this section we seek to establish a theoretical basis for 
GOTCHAs. Several of the ideas behind our definitions are 
borrowed from theoretical definitions of CAPTCHAs [43] 
and HOSPs [14]. Like CAPTCHAs and HOSPs, GOTCHAs
are based on the assumption that some AI problem is hard 
for a computer to solve, but easy for a person to solve. Ul- 
timately, these assumptions are almost certainly false (e.g., 
because the human brain can solve a GOTCHA it is rea- 
sonable to believe that there exists a computer program to 
solve the problems). However, it may still be reasonable 
to assume that these problems cannot be solved by applying 
known ideas. By providing a formal definition of GOTCHAs 
we can determine whether or not a new idea can be used to 
break a candidate GOTCHA construction. 

We use $c \in C$ to denote a challenge from the space $C$ of
challenges that might be generated. We use $\mathcal{H}$ to
denote the set of human users and $H(c, \sigma_t)$ to denote
the response that a human $H \in \mathcal{H}$ gives to the
challenge $c \in C$ at time $t$. Here, $\sigma_t$ denotes the
state of the human's brain at time $t$. $\sigma_t$ is
supposed to encode our user's existing knowledge (e.g.,
vocabulary, experiences) as well as the user's mental state at
time $t$ (e.g., what is the user thinking about at time $t$).
Because $\sigma_t$ changes over time (e.g., new experiences) we
use $H(c) = \{ H(c, \sigma_t) \mid t \in \mathbb{N} \}$ to
denote the set of all answers a human might give to a
challenge $c$. We use $A$ to denote the range of possible
responses (answers) that a human might give to the challenges.

³ In fact the resulting passwords are sometimes more vulnerable
to an offline attack! [28, 29]


DEFINITION 1. Given a metric $d : A \times A \to \mathbb{R}$,
we say that a human $H$ can consistently solve a challenge
$c \in C$ with accuracy $\alpha$ if $\forall t \in \mathbb{N}$

$$d(H(c, \sigma_0), H(c, \sigma_t)) \le \alpha,$$

where $\sigma_0$ denotes the state of the human's brain when he
initially answers the challenge. If $|H(c)| = 1$ then we simply
say that the human can consistently solve the challenge.


Notation: When we have a group of challenges
$(c_1, \ldots, c_k)$ we will sometimes write
$H((c_1, \ldots, c_k), \sigma_t) = (H(c_1, \sigma_t), \ldots, H(c_k, \sigma_t))$
for notational convenience. We use $y \sim D$ to denote a
random sample from the distribution $D$, and we use
$r \overset{\$}{\leftarrow} \{0,1\}^n$ to denote an element
drawn from the set $\{0,1\}^n$ uniformly at random.

One of the requirements of a HOSP puzzle system [14] is
that the human $H$ must be able to consistently answer any
challenge that is generated (e.g., $\forall c \in C$, $H$ can
consistently solve $c$). These requirements seem to rule out
promising ideas for HOSP constructions like Inkblots [15]. In
this construction the challenge is a randomly generated inkblot
image $I$, and the response $H(I, \sigma_0)$ is a word or
phrase describing what the user initially sees in the inkblot
image (e.g., evil clown, soldier, big lady with a ponytail).
User studies have shown that $H(I, \sigma_0)$ does not always
match $H(I, \sigma_t)$, the phrase describing what the user
sees at time $t$ [B]. In a few cases the errors may be
correctable (e.g., capitalization, plural/singular form of a
word), but oftentimes the phrase was completely different,
especially if a long time passed in between trials⁴. By
contrast, our GOTCHA construction does not require the user to
remember the phrases associated with each Inkblot. Instead we
rely on a much weaker assumption: the user can consistently
recognize his solutions. We say that a human can recognize his
solutions to a set of challenges if he can consistently solve a
matching challenge (definition 2) in which he is asked to match
each of his solutions with the corresponding challenge.


DEFINITION 2. Given an integer $k$, and a permutation
$\pi : [k] \to [k]$, a matching challenge
$\hat{c}_\pi = (\vec{c}, \vec{a}) \in C$ of size $k$ is given by
a $k$-tuple of challenges
$\vec{c} = (c_{\pi(1)}, \ldots, c_{\pi(k)}) \in C^k$ and
solutions $\vec{a} = H((c_1, \ldots, c_k), \sigma_0)$. The
response to a matching challenge is a permutation
$\pi' = H(\hat{c}_\pi, \sigma_t)$.


For permutations $\pi : [k] \to [k]$ we use the distance metric

$$d_k(\pi_1, \pi_2) = \left| \{ i \mid \pi_1(i) \ne \pi_2(i) \wedge 1 \le i \le k \} \right|.$$

$d_k(\pi_1, \pi_2)$ simply counts the number of entries where the
permutations don't match. We say that a human can consistently
recognize his solution to a matching challenge $\hat{c}_\pi$ with
accuracy $\alpha$ if
$\forall t.\ d_k(H(\hat{c}_\pi, \sigma_t), \pi) \le \alpha$. We
use $\{ \pi' \mid d_k(\pi, \pi') \le \alpha \}$ to denote the set
of permutations $\pi'$ that are $\alpha$-close to $\pi$.
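The distance metric and the set of close permutations can be made concrete with a short sketch. Permutations are represented here as 0-indexed tuples, and the helper names `d_k` and `alpha_close` are chosen for this example only. The brute-force enumeration visits all k! permutations, so it is meant for small k.

```python
from itertools import permutations


def d_k(p1, p2):
    """Count the positions where two permutations disagree."""
    return sum(1 for a, b in zip(p1, p2) if a != b)


def alpha_close(pi, alpha):
    """All permutations within distance alpha of pi (brute force over k!)."""
    k = len(pi)
    return [p for p in permutations(range(k)) if d_k(pi, p) <= alpha]


identity = (0, 1, 2, 3)
swapped = (1, 0, 2, 3)  # one transposition: disagrees in two positions
```

Two distinct permutations always disagree in at least two positions, so for k = 4 the set `alpha_close(identity, 2)` consists of the identity and the six transpositions.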


⁴ We would add the requirement that the human must be
able to consistently answer the challenges without spending 
time memorizing and rehearsing his response to the chal- 
lenge. Otherwise we could just as easily force the user to 
remember a random string to append on to his password. 


The puzzle generation process for a GOTCHA involves 
interaction between the human and a computer: (1) The 
computer generates a set of k challenges. (2) The human 
solves these challenges. (3) The computer uses the solutions 
to produce a final challenge P. Formally, 


DEFINITION 3. A puzzle-system is a pair $(G_1, G_2)$, where
$G_1$ is a randomized challenge generator that takes as input
$1^k$ (with $k$ the security parameter) and a pair of random bit
strings $r_1, r_2 \in \{0,1\}^*$ and outputs $k$ challenges
$(c_1, \ldots, c_k) \leftarrow G_1(1^k, r_1, r_2)$. $G_2$ is a
randomized challenge generator that takes as input $1^k$
(security parameter), a random bit string $r_1 \in \{0,1\}^*$,
and proposed answers $\vec{a} = (a_1, \ldots, a_k)$ to the
challenges $G_1(1^k, r_1, r_2)$ and outputs a challenge
$\hat{c} \leftarrow G_2(1^k, r_1, \vec{a})$. We say that the
puzzle-system is $(\alpha, \beta)$-usable if

$$\Pr_{H \overset{\$}{\leftarrow} \mathcal{H}} \left[ \mathrm{Accurate}(H, \hat{c}, \alpha) \right] \ge \beta,$$

whenever $\vec{a} = H(G_1(1^k, r_1, r_2), \sigma_0)$, where
$\mathrm{Accurate}(H, \hat{c}, \alpha)$ denotes the event that
the human $H$ can consistently solve $\hat{c}$ with accuracy
$\alpha$.


In our authentication setting the random string $r_1$ is
extracted from the user's password using a strong pseudorandom
function Extract. To provide a concrete example of a
puzzle-system, $G_1$ could be a program that generates a set of
inkblot challenges $(I_1, \ldots, I_k)$ using random bits $r_1$,
selects a random permutation $\pi : [k] \to [k]$ using random
bits $r_2$, and returns $(I_{\pi(1)}, \ldots, I_{\pi(k)})$. The
human's response to an Inkblot, $H(I_j, \sigma_0)$, is whatever
he/she imagines when he sees the inkblot $I_j$ for the first
time (e.g., some people might imagine an evil clown when they
look at figure 1). Finally, $G_2$ might generate Inkblots
$\vec{c} = (I_1, \ldots, I_k)$ using random bits $r_1$, and
return the matching challenge $\hat{c}_\pi = (\vec{c}, \vec{a})$.
In this case the matching challenge is for the user to match his
labels with the appropriate Inkblot images to recover the
permutation $\pi$. Observe that the final challenge
$\hat{c}_\pi$ can only be generated after a round of interaction
between the computer and a human. By contrast, the challenges in
a HOSP must be generated automatically by a computer.
Also notice that if $G_2$ is executed with a different random
bit string $r_1'$ then we do not require the resulting challenge
to be consistently recognizable (e.g., if the user enters in the
wrong password then authentication will fail regardless of
how he solves the resulting challenge). For example, if the
user enters the wrong password the user might be asked to
match his labels
$(\ell_{\pi(1)}, \ldots, \ell_{\pi(k)}) = H((I_{\pi(1)}, \ldots, I_{\pi(k)}), \sigma_0)$
with Inkblots $(I_1', \ldots, I_k')$ that he has never seen.

An adversary could attack a puzzle system by either (1) 
attempting to distinguish between the correct puzzle, and 
puzzles that might be meaningless to the human, or (2) by 
solving the matching challenge directly. 

We say that an algorithm $A$ can distinguish distributions
$D_1$ and $D_2$ with advantage $\epsilon$ if

$$\left| \Pr_{x \sim D_1}[A(x) = 1] - \Pr_{y \sim D_2}[A(y) = 1] \right| \ge \epsilon.$$

Our formal definition of a GOTCHA is found in definition 4.
Intuitively, definition 4 says that (1) The underlying
puzzle-system should be usable, so that legitimate users
can authenticate. (2) It should be difficult for the adversary
to distinguish between the correct matching challenge (e.g.,
the one that the user will see when he types in the correct
password), and an incorrect matching challenge (e.g.,
if the user enters the wrong password he will be asked to
match his labels with different Inkblot images), and (3) It
should be difficult for the adversary to distinguish between
the user's matching, and a random matching drawn from a
distribution $R$ with sufficiently high minimum entropy.

⁵ We note that a HOSP puzzle system $(G)$ [14] can be modeled
as a GOTCHA puzzle system $(G_1, G_2)$ where $G_1$ does
nothing and $G_2$ simply runs $G$ to generate the final
challenge $\hat{c}$ directly.


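DEFINITION 4. A puzzle-system $(G_1, G_2)$ is an
$(\alpha, \beta, \epsilon, \delta, \mu)$-GOTCHA if (1)
$(G_1, G_2)$ is $(\alpha, \beta)$-usable (2) Given a human
$H \in \mathcal{H}$ no probabilistic polynomial time algorithm
can distinguish between distributions

$$D_1 = \left\{ \Big( H(G_1(1^k, r_1, r_2), \sigma_0),\ G_2(1^k, r_1, H(G_1(1^k, r_1, r_2), \sigma_0)) \Big) \;\Big|\; r_1, r_2 \overset{\$}{\leftarrow} \{0,1\}^n \right\}$$

and

$$D_2 = \left\{ \Big( H(G_1(1^k, r_1, r_2), \sigma_0),\ G_2(1^k, r_3, H(G_1(1^k, r_1, r_2), \sigma_0)) \Big) \;\Big|\; r_1, r_2, r_3 \overset{\$}{\leftarrow} \{0,1\}^n \right\}$$

with advantage greater than $\epsilon$, and (3) Given a human
$H \in \mathcal{H}$, there is a distribution $R(c)$ with
$\mu(m)$ bits of minimum entropy such that no probabilistic
polynomial time algorithm can distinguish between distributions

$$D_3 = \left\{ \Big( H(G_1(1^k, r_1, r_2), \sigma_0),\ G_2(1^k, r_1, H(G_1(1^k, r_1, r_2), \sigma_0)),\ H(G_2(1^k, r_1, H(G_1(1^k, r_1, r_2), \sigma_0)), \sigma_0) \Big) \;\Big|\; r_1, r_2 \overset{\$}{\leftarrow} \{0,1\}^n \right\}$$

and

$$D_4 = \left\{ \Big( H(G_1(1^k, r_1, r_2), \sigma_0),\ G_2(1^k, r_1, H(G_1(1^k, r_1, r_2), \sigma_0)),\ R(G_2(1^k, r_1, H(G_1(1^k, r_1, r_2), \sigma_0))) \Big) \;\Big|\; r_1, r_2 \overset{\$}{\leftarrow} \{0,1\}^n \right\}$$

with advantage greater than $\delta$.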


2.1 Password Storage and Offline Attacks 


To protect users in the event of a server breach organizations
are advised to store salted password hashes, using
a cryptographic hash function ($h : \{0,1\}^* \to \{0,1\}^n$)
and a random bit string ($s \in \{0,1\}^*$) [BS]. For example,
if a user ($u$) chose the password ($pw$) the server would store
the tuple $(u, s, h(s, pw))$. Any adversary who has obtained
$(u, s, h(s, pw))$ (e.g., through a server breach) may mount a
fully automated offline dictionary attack using powerful
password crackers like John the Ripper l]. To verify
a guess $pw'$ the adversary simply computes $h(s, pw')$ and
checks to see if this hash matches $h(s, pw)$.

We assume that an adversary Adv who breaches the server
can obtain the code for $h$, as well as the code for any
GOTCHAs used in the authentication protocol. Given the code
for $h$ and the salt value $s$ the adversary can construct a
function

$$\mathrm{VerifyHash}(pw') = \begin{cases} 1 & \text{if } h(s, pw) = h(s, pw') \\ 0 & \text{otherwise.} \end{cases}$$
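A minimal sketch of how an adversary might build and use VerifyHash follows. It assumes SHA-256 over the salt followed by the password as the hash h; the names `make_verify_hash` and `dictionary_attack` are hypothetical. The adversary only needs the salt and the stored hash from the breached record, never the password itself.

```python
import hashlib


def h(s: bytes, pw: str) -> bytes:
    """Salted hash h(s, pw); SHA-256 is an assumption for this example."""
    return hashlib.sha256(s + pw.encode()).digest()


def make_verify_hash(s: bytes, stored_hash: bytes):
    """Build VerifyHash from the breached salt and hash value."""
    def verify_hash(guess: str) -> int:
        return 1 if h(s, guess) == stored_hash else 0
    return verify_hash


def dictionary_attack(verify_hash, dictionary):
    """Try guesses in order; return (password or None, number of queries)."""
    queries = 0
    for guess in dictionary:
        queries += 1
        if verify_hash(guess) == 1:
            return guess, queries
    return None, queries


s = b"salt"
verify = make_verify_hash(s, h(s, "letmein"))
```

The attack loop never involves a human, which is exactly the fully automated behavior that a GOTCHA is meant to rule out.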


We also allow the adversary to have black box access to
a GOTCHA solver (e.g., a human). We use $c_H$ to denote
the cost of querying a human and $c_h$ to denote the cost of
querying the function VerifyHash⁶, and we use $n_H$ (resp.
$n_h$) to denote the number of queries to the human (resp.
VerifyHash). Queries to the human GOTCHA solver are
much more expensive than queries to the cryptographic hash
function ($c_H \gg c_h$) [BI]. For technical reasons we limit
our analysis to conservative adversaries.

⁶ The value of $c_h$ may vary widely depending on the
particular cryptographic hash function: it is inexpensive to
evaluate SHA1, but BCRYPT [B5] may be very expensive
to evaluate.


DEFINITION 5. We say that an adversary Adv is conser- 
vative if (1) Adv uses the cryptographic hash function h 
in a black box manner (e.g., the hash function h and the 
stored hash value are only used to construct a subroutine 
VerifyHash which is then used as a black box by Adv ), 
(2) The pseudorandom function Extract is used as a black 
box, and (3) The adversary only queries a human about
challenges generated using a password guess.


It is reasonable to believe that our adversary is conservative. 
All existing password crackers (e.g., [Z]) use the hash func- 
tion as a black box, and it is difficult to imagine that the 
adversary would benefit by querying a human solver about 
Inkblots that are unrelated to the password. 

We use $D \subseteq \{0,1\}^*$ to denote a dictionary of likely
guesses that the adversary would like to try,

$$\mathrm{Cost}(\mathrm{Adv}, D) = n_h c_h + n_H c_H$$

to denote the cost of the queries that the adversary makes to
check each guess in $D$, and Succeed(Adv, $D$, $pw$) to denote
the event that the adversary makes a query to VerifyHash
that returns 1 (e.g., the adversary successfully finds the
user's password $pw$). The adversary might use a computer
program to try to solve some of the GOTCHAs to save
cost by not querying a human. However, in this case the
adversary might fail to crack the password because the GOTCHA
solver found the wrong solution to one of the challenges.
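The cost model can be illustrated with a small calculation. All prices below are hypothetical and expressed in integer micro-dollars to keep the arithmetic exact; they are not taken from the paper. The example compares an attack that only evaluates the hash with one that must also query a human for every guess.

```python
def attack_cost(n_h: int, n_H: int, c_h: int, c_H: int) -> int:
    """Cost(Adv, D) = n_h * c_h + n_H * c_H."""
    return n_h * c_h + n_H * c_H


# Hypothetical prices in micro-dollars (assumptions for illustration).
C_HASH = 1       # one evaluation of VerifyHash
C_HUMAN = 1000   # one query to a human GOTCHA solver

guesses = 1_000_000
hash_only = attack_cost(guesses, 0, C_HASH, C_HUMAN)
with_human = attack_cost(guesses, guesses, C_HASH, C_HUMAN)
```

Under these made-up prices the human queries account for almost all of the cost, reflecting the assumption that querying a human is far more expensive than evaluating the hash.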


DEFINITION 6. An adversary Adv is $(C, \gamma, D)$-successful
if $\mathrm{Cost}(\mathrm{Adv}, D) \le C$, and

$$\Pr_{pw \overset{\$}{\leftarrow} D} \left[ \mathrm{Succeed}(\mathrm{Adv}, D, pw) \right] \ge \gamma.$$


Our attack model is slightly different from the attack 
model in [14]. They assume that the adversary may ask
a limited number of queries to a human challenge solution 
oracle. Instead we adopt an economic model similar to [HU], 
and assume that the adversary is instead limited by a budget 
C, which may be used to either evaluate the cryptographic 
hash function h or query a human H. 


3. INKBLOT CONSTRUCTION 


Our candidate GOTCHA construction is based on Inkblot
images. We use algorithm 1 to generate inkblot images.
Algorithm 1 takes as input random bits $r_1$ and a security
parameter $k$, which specifies the number of Inkblots to
output. Algorithm 1 makes use of the randomized subroutine
DrawRandomEllipsePairs($I$, $t$, width, height) which draws
$t$ pairs of ellipses on the image $I$ with the specified width
and height. The first ellipse in each pair is drawn at a
random $(x, y)$ coordinate on the left half of the image with a
randomly selected color and angle $a$ of rotation, and the
second ellipse is mirrored on the right half of the image.
Figure 1 is an example of an Inkblot image generated by
algorithm 1.

Our candidate GOTCHA is given by the pair $(G_1, G_2)$
(algorithms 2 and 3). $G_1$ runs algorithm 1 to generate $k$
Inkblot images, and then returns these images in permuted
order, using a function GenerateRandomPermutation($k$, $r$),
which generates a random permutation $\pi : [k] \to [k]$ using
random bits $r$. $G_2$ also runs algorithm 1 to generate $k$
Inkblot images, and then outputs a matching challenge.


Algorithm 1 GenerateInkblotImages

Input: Security Parameter $1^k$, Random bit string
$r_1 \in \{0,1\}^*$.
for $j = 1, \ldots, k$ do
    $I_j \leftarrow$ new Blank Image    ▷ The following operations
        only use the random bit string $r_1$ as a source of
        randomness
    DrawRandomEllipsePairs($I_j$, 150, 60, 60)
    DrawRandomEllipsePairs($I_j$, 70, 20, 20)
    DrawRandomEllipsePairs($I_j$, 150, 60, 20)
return $(I_1, \ldots, I_k)$    ▷ Inkblot Images
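The inkblot generation step can be sketched without an imaging library by recording the ellipse parameters instead of drawing pixels. Several details here are assumptions rather than the authors' implementation: the 600 by 400 canvas, the RGB color model, the `Ellipse` record, and the use of Python's `random.Random` seeded with the bit string (a real deployment would need a cryptographically strong generator).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Ellipse:
    x: float
    y: float
    width: int
    height: int
    angle: float
    color: tuple


def draw_random_ellipse_pairs(rng, t, width, height, canvas_w=600, canvas_h=400):
    """Return t mirrored ellipse pairs; canvas size is a hypothetical choice."""
    shapes = []
    for _ in range(t):
        x = rng.uniform(0, canvas_w / 2)  # centre on the left half
        y = rng.uniform(0, canvas_h)
        angle = rng.uniform(0, 360)
        color = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
        left = Ellipse(x, y, width, height, angle, color)
        # Reflect across the vertical centre line: x flips, rotation negates.
        right = Ellipse(canvas_w - x, y, width, height, -angle, color)
        shapes += [left, right]
    return shapes


def generate_inkblot_images(k, r1: bytes):
    """Produce k inkblots, using r1 as the only source of randomness."""
    rng = random.Random(r1)
    images = []
    for _ in range(k):
        image = []
        image += draw_random_ellipse_pairs(rng, 150, 60, 60)
        image += draw_random_ellipse_pairs(rng, 70, 20, 20)
        image += draw_random_ellipse_pairs(rng, 150, 60, 20)
        images.append(image)
    return images
```

Because the generator is seeded only with the bit string, the same input always reproduces the same inkblots, which is the property the authentication protocol relies on.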


Algorithm 2 $G_1$

Input: Security Parameter $1^k$, Random bit strings
$r_1, r_2 \in \{0,1\}^*$.
$(I_1, \ldots, I_k) \leftarrow$ GenerateInkblotImages($1^k$, $r_1$)
$\pi \leftarrow$ GenerateRandomPermutation($k$, $r_2$)
return $(I_{\pi(1)}, \ldots, I_{\pi(k)})$


After the Inkblots $(I_{\pi(1)}, \ldots, I_{\pi(k)})$ have been
generated, the human user is queried to provide labels
$\ell_{\pi(1)}, \ldots, \ell_{\pi(k)}$ where

$$(\ell_{\pi(1)}, \ldots, \ell_{\pi(k)}) = H((I_{\pi(1)}, \ldots, I_{\pi(k)}), \sigma_0).$$

In our authentication setting the server would store the
labels $\ell_{\pi(1)}, \ldots, \ell_{\pi(k)}$ in permuted order.
The final challenge, generated by algorithm 3, is to match the
Inkblot images $I_1, \ldots, I_k$ with the user generated labels
$\ell_1, \ldots, \ell_k$ to recover the permutation $\pi$.


Algorithm 3 GenerateMatchingChallenge $G_2$

Input: Security Parameter $1^k$, Random bits
$r_1 \in \{0,1\}^*$ and labels
$\vec{a} = (\ell_{\pi(1)}, \ldots, \ell_{\pi(k)})$.
$(I_1, \ldots, I_k) \leftarrow$ GenerateInkblotImages($1^k$, $r_1$)
return $\hat{c}_\pi = (\vec{c}, \vec{a})$    ▷ Matching Challenge


Observation: Notice that if the random bits provided
as input to GenerateInkblotImages and
GenerateMatchingChallenge match then the user will
see the same Inkblot images in the final matching challenge.
However, if the random bits do not match (e.g., because
the user typed the wrong password in our authentication
protocol) then the user will see different Inkblot images. The
labels $\ell_1, \ldots, \ell_k$ will be the same in both cases.
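The interplay between the two generators, and the observation above, can be sketched with stand-in images. In this illustration each "image" is just a short digest derived from the random bits and an index, so the example stays self-contained; the helper names and the use of `random.Random` for the permutation are assumptions made for the sketch.

```python
import hashlib
import random


def generate_inkblot_images(k: int, r1: bytes):
    """Stand-in for algorithm 1: each image is an identifier fixed by r1."""
    return [hashlib.sha256(r1 + bytes([j])).hexdigest()[:12] for j in range(k)]


def generate_random_permutation(k: int, r: bytes):
    """Permutation of 0..k-1 determined by the random bits r."""
    perm = list(range(k))
    random.Random(r).shuffle(perm)
    return perm


def g1(k: int, r1: bytes, r2: bytes):
    """Return the k images in permuted order."""
    images = generate_inkblot_images(k, r1)
    pi = generate_random_permutation(k, r2)
    return [images[pi[i]] for i in range(k)]


def g2(k: int, r1: bytes, labels):
    """Return the matching challenge: fresh images plus the stored labels."""
    return generate_inkblot_images(k, r1), list(labels)


shown = g1(4, b"r1", b"r2")
labels = ["label for " + img for img in shown]  # stands in for the human
right_images, right_labels = g2(4, b"r1", labels)
wrong_images, wrong_labels = g2(4, b"not-r1", labels)
```

With matching random bits the challenge contains exactly the images the user labeled; with different bits the labels are unchanged but the images are new.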


3.1 GOTCHA Authentication 


To illustrate how our GOTCHAs can be used to defend
against offline attacks we present the following authentication
protocols: Create Account (protocol 3.1) and
Authenticate (protocol 3.2). Communication in both protocols
should take place over a secure channel. Both protocols
involve several rounds of interaction between the user and
the server. To create a new account the user sends his
username/password to the server, the server responds by
generating $k$ Inkblot images $I_1, \ldots, I_k$, and the user
provides a response
$(\ell_1, \ldots, \ell_k) = H((I_1, \ldots, I_k), \sigma_0)$
based on his mental state at the time. The server stores these
labels in permuted order $\ell_{\pi(1)}, \ldots, \ell_{\pi(k)}$.
To authenticate later the user will have to match these labels
with the corresponding inkblot images to recover the
permutation $\pi$.

In a later section we argue that the adversary who wishes to
mount a cost effective offline attack needs to obtain constant
feedback from a human. Following [14] we assume that the
function Extract : $\{0,1\}^* \to \{0,1\}^n$ is a strong
randomness extractor, which can be used to extract random
strings from the user's password. Recall that
$h : \{0,1\}^* \to \{0,1\}^n$ denotes a cryptographic hash
function.


Protocol 3.1: Create Account


Security Parameters: $k, n$.

(User): Select username ($u$) and password ($pw$) and send $(u, pw)$ to the server.

(Server): Sends Inkblots $(I_1, \ldots, I_k)$ to the user where:
    $r' \xleftarrow{\$} \{0,1\}^n$, $r_1 \leftarrow$ Extract$(pw, r')$, $r_2 \xleftarrow{\$} \{0,1\}^n$ and
    $(I_1, \ldots, I_k) \leftarrow$ GenerateInkblotImages$(1^k, r_1)$

(User): Sends responses $(\ell_1, \ldots, \ell_k)$ back to the server where:
    $(\ell_1, \ldots, \ell_k) = H((I_1, \ldots, I_k), \sigma_0)$.

(Server): Store the tuple $t$ where $t$ is computed as follows:
    Salt: $s \xleftarrow{\$} \{0,1\}^n$
    $\pi \leftarrow$ GenerateRandomPermutation$(k, r_2)$.
    $h_{pw} \leftarrow h(u, s, pw, \pi(1), \ldots, \pi(k))$
    $t \leftarrow (u, r', s, h_{pw}, \ell_{\pi(1)}, \ldots, \ell_{\pi(k)})$


Protocol 3.2: Authenticate


Security Parameters: $k, n$.
Usability Parameter: $\alpha$

(User): Send username ($u$) and password ($pw'$); $pw'$ may or may not be correct.

(Server): Sends challenge $\hat{c}_\pi$ to the user where $\hat{c}_\pi$ is computed as follows:
    Find $t = (u, r', s, h_{pw}, \ell_{\pi(1)}, \ldots, \ell_{\pi(k)})$
    $r_1' \leftarrow$ Extract$(pw', r')$
    $(I_1', \ldots, I_k') \leftarrow$ GenerateInkblotImages$(1^k, r_1')$
    $\hat{c}_\pi \leftarrow ((I_1', \ldots, I_k'), (\ell_{\pi(1)}, \ldots, \ell_{\pi(k)}))$

(User): Solves $\hat{c}_\pi$ and sends the answer $\pi' = H(\hat{c}_\pi, \sigma_t)$.

(Server):
    for all $\pi_0$ s.t. $d_k(\pi_0, \pi') \leq \alpha$ do
        $h_{pw,0} \leftarrow h(u, s, pw', \pi_0(1), \ldots, \pi_0(k))$
        if $h_{pw,0} = h_{pw}$ then
            Authenticate
    Deny
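The two protocols can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `extract` and `h` are SHA-256 stand-ins (a deployment would use a proper extractor and a slow password hash), `label_fn` is a hypothetical callback that plays the role of the human labelling the images derived from $r_1$, and the distance function counts positions where two matchings differ.

```python
import hashlib
import itertools
import os
import random


def extract(pw: str, r_prime: bytes) -> bytes:
    # Assumption: SHA-256 stands in for the randomness extractor Extract.
    return hashlib.sha256(r_prime + pw.encode()).digest()


def h(u: str, s: bytes, pw: str, perm) -> bytes:
    # Assumption: SHA-256 stands in for the (slow) password hash h.
    return hashlib.sha256(u.encode() + s + pw.encode() + bytes(perm)).digest()


def distance(p, q) -> int:
    """Number of positions on which two matchings differ."""
    return sum(a != b for a, b in zip(p, q))


def create_account(u: str, pw: str, label_fn, k: int = 4, n: int = 16):
    """Sketch of Protocol 3.1: returns the tuple t stored by the server."""
    r_prime, r2, s = os.urandom(n), os.urandom(n), os.urandom(n)
    r1 = extract(pw, r_prime)
    labels = label_fn(r1, k)            # the user's labels for I_1..I_k
    pi = list(range(k))
    random.Random(r2).shuffle(pi)       # pi <- GenerateRandomPermutation(k, r2)
    stored = [labels[pi[i]] for i in range(k)]
    return (u, r_prime, s, h(u, s, pw, pi), stored)


def make_challenge(t, pw_guess: str):
    """Server side of Protocol 3.2: image seed for the guess plus stored labels."""
    _, r_prime, _, _, stored = t
    return extract(pw_guess, r_prime), stored


def authenticate(t, pw_guess: str, answer, alpha: int) -> bool:
    """Accept if some matching within distance alpha of the answer hashes to h_pw."""
    u, _, s, h_pw, stored = t
    for pi0 in itertools.permutations(range(len(stored))):
        if distance(pi0, answer) <= alpha and h(u, s, pw_guess, pi0) == h_pw:
            return True
    return False
```

Enumerating all permutations and filtering by distance is only practical for small $k$; it is used here to keep the sketch short.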


Our protocol could be updated to allow the user to reject challenges he found confusing during account creation in Protocol 3.1. In this case the server would simply note that the first GOTCHA was confusing and generate a new GOTCHA. Once our user has created an account he can login by following Protocol 3.2.

Note on Protocol 3.1: for a general GOTCHA, Protocol 3.1 would need to have an extra round of communication. The server would send the user the final challenge generated by $G_2$ and the user would respond with $H(G_2(\cdot), \sigma_0)$. Protocol 3.1 takes advantage of the fact that $\pi = H(G_2(\cdot), \sigma_0)$ is already known.

Claim 1 says that a legitimate user can successfully authenticate if our Inkblot construction satisfies the usability requirements of a GOTCHA. The proof of Claim 1 can be found in Appendix A.


CLAIM 1. If $(G_1, G_2)$ is an $(\alpha, \beta, \epsilon, \delta, \mu)$-GOTCHA then at least a $\beta$-fraction of humans can successfully authenticate using Protocol 3.2 after creating an account using Protocol 3.1.


One way to improve usability of our authentication protocol is to increase the neighborhood of acceptably close matchings by increasing $\alpha$. The disadvantage is that the running time for the server in Protocol 3.2 increases with the size of $\alpha$. Claim 2 bounds the time needed to enumerate over all close permutations. The proof of Claim 2 can be found in Appendix A.


CLAIM 2. For all permutations $\pi : [k] \to [k]$ and $\alpha \geq 0$

$\left|\{\pi' \mid d_k(\pi, \pi') \leq \alpha\}\right| \leq 1 + \sum_{i=2}^{\alpha} \binom{k}{i} i!$.


For example, if the user matches $k = 10$ Inkblots and we want to accept matchings that are off by at most $\alpha = 5$ entries then the server would need to enumerate over at most 36,091 permutations. Organizations are already advised to use password hash functions like BCRYPT [B5], which are intentionally designed to be slower than standard cryptographic hash functions (often by a factor of millions). Instead of
making the hash function a million times slower to evaluate 
the server might instead make the hash function a thousand 
times slower to evaluate and use these extra computation 
cycles to enumerate over close permutations. The orga- 
nization’s trade-off is between: security, usability and the 
resources that it needs to invest during the authentication 
process. 
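The figures in this example can be checked with a short Python sketch. The function `claim2_bound` evaluates the bound of Claim 2 directly. The function `exact_count` is an addition for illustration: it assumes the exact number of matchings within distance $\alpha$ equals the sum over $j \leq \alpha$ of $\binom{k}{j}$ times the number of derangements of $j$ items, which reproduces the exact figure of 13,264 that the paper reports for $k = 10$, $\alpha = 5$.

```python
from math import comb, factorial


def claim2_bound(k: int, alpha: int) -> int:
    """Upper bound of Claim 2: 1 + sum_{i=2}^{alpha} C(k, i) * i!."""
    return 1 + sum(comb(k, i) * factorial(i) for i in range(2, alpha + 1))


def derangements(j: int) -> int:
    """Permutations of j items with no fixed point (D_0 = 1, D_1 = 0)."""
    a, b = 1, 0  # D_0, D_1
    for i in range(2, j + 1):
        a, b = b, (i - 1) * (a + b)
    return a if j == 0 else b


def exact_count(k: int, alpha: int) -> int:
    """Matchings that differ from a fixed matching in at most alpha positions."""
    return sum(comb(k, j) * derangements(j) for j in range(alpha + 1))


if __name__ == "__main__":
    print(claim2_bound(10, 5))                 # bound used in the example
    print(exact_count(10, 5))                  # exact count
    print(exact_count(10, 5) / factorial(10))  # chance a random matching is accepted
```

The gap between the two numbers comes from the bound counting every arrangement of the $i$ chosen positions, including those that leave some of them unchanged.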

We observe that an adversary mounting an online attack would be naturally rate limited because he would need to solve a GOTCHA for each new guess. Protocol 3.2 could also be supplemented with a k-strikes policy, in which a user is locked out for several hours after k incorrect login attempts, if desired.


3.2 User Study 


To test our candidate GOTCHA construction we conducted an online user study. We recruited participants through Amazon’s Mechanical Turk to participate in our study. The study was conducted in two phases. In phase 1 we generated ten random Inkblot images for each participant, and asked each participant to provide labels for their Inkblot images. Participants were advised to use creative titles (e.g., evil clown, frog, lady with poofy dress) because they would not need to remember the exact titles that they used. Participants were paid $1 for completing this first phase. A total of 70 users completed phase 1.

Note on the bound of 36,091 permutations in Section 3.1: a more precise calculation reveals that there are exactly 13,264 permutations s.t. $d_{10}(\pi', \pi) \leq 5$ and a random permutation $\pi'$ would only be accepted with probability $3.66 \times 10^{-3}$.

Note on the user study: our study protocol was approved for exemption by the Institutional Review Board (IRB) at Carnegie Mellon University (IRB Protocol Number: HS13-219).


                    | Phase 1 | Phase 2
Average             | 9.3     | 4.5
StdDev              | 9.6     | 3
Max                 | 57.5    | 18.5
Min                 | 1.4     | 1.6
Average (≤ 20 min)  | 6.2     | N/A


Table 1: Completion Times (minutes)


$\alpha$     | # $\alpha$-accurate participants | Fraction of participants | Probability a random matching is accepted
$\alpha = 0$ | 17 | 0.29 | $2.76 \times 10^{-7}$
$\alpha = 2$ | 22 | 0.38 | $1.27 \times 10^{-5}$
$\alpha = 3$ | 26 | 0.45 | $7.88 \times 10^{-5}$
$\alpha = 4$ | 34 | 0.59 | $6.00 \times 10^{-4}$
$\alpha = 5$ | 40 | 0.69 | $3.66 \times 10^{-3}$


Table 2: Usability Results: Fraction of Participants who would have authenticated with accuracy parameter $\alpha$

After our participants completed the first phase we waited 
ten days before asking our participants to return and com- 
plete phase 2. During phase 2 we showed each participant 
the Inkblot images they saw in phase 1 (in a random or- 
der) as well as the titles that they created during phase 1 
(in alphabetical order). Participants were asked to match 
the labels with the appropriate image. The purpose of the 
longer waiting time was to make sure that participants had 
time to forget their images and their labels. Participants 
were paid an additional $1 for completing phase 2 of the 
user study. At the beginning of the user study we let par- 
ticipants know that they would be paid during phase 2 even 
if their answers were not correct. We adopted this policy to 
discourage cheating (e.g., using screen captures from phase 
1 to match the images and the labels) and avoid positively 
biasing our results. 

We measured the time it took each participant to complete phase 1. Our results are summarized in Table 1. It is quite
likely that some participants left their computer in the mid- 
dle of the study and returned later to complete the study 
(e.g., one user took 57.5 minutes to complete the study). 
While we could not measure time away from the computer, 
we believe that it is likely that at least 9 of our participants 
left the computer. Restricting our attention to the other 61 
participants who took at most 20 minutes we get an adjusted 
average completion time of 6.2 minutes. 

Fifty-eight of our participants returned to complete phase 2 by taking our matching test. It took these participants 4.5 minutes on average to complete the matching test. Seventeen of our participants correctly matched all ten of their labels, and 69% of participants matched at least 5 out of ten labels correctly. Our results are summarized in Table 2.
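The fractions reported for the matching test follow directly from the participant counts. The short Python sketch below recomputes them from the numbers given in the text and in Table 2; the variable names are chosen here for illustration.

```python
# Participants who completed each phase (from the text).
completed_phase_1 = 70
completed_phase_2 = 58

# alpha -> number of phase 2 participants whose matching was within alpha
# entries of the correct one (counts as reported in Table 2).
accurate = {0: 17, 2: 22, 3: 26, 4: 34, 5: 40}

# Fraction of phase 2 participants who would have authenticated for each alpha.
fractions = {alpha: round(n / completed_phase_2, 2) for alpha, n in accurate.items()}

# Share of phase 1 participants who came back for phase 2.
return_rate = round(completed_phase_2 / completed_phase_1, 2)

if __name__ == "__main__":
    for alpha, frac in fractions.items():
        print(f"alpha = {alpha}: {frac}")
    print(f"return rate: {return_rate}")
```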


Discussion. 

Our user study provides evidence that our construction is 
at least (0, 0.29)-usable or (5, 0.69)-usable. While this means 
that our Inkblot Matching GOTCHA could be used by a 
significant fraction of the population to protect their pass- 


words during authentication it also means that the use of 
our GOTCHA would have to be voluntary so that users who 
have difficulty won’t get locked out of their accounts. An- 
other approach would be to construct different GOTCHAs 
and allow users to choose which GOTCHA to use during 
authentication. 

Study Incentives: There is evidence that the lack of 
monetary incentives to perform well on our matching test 
may have negatively influenced the results (e.g., some participants may have rushed through phase 1 of the study because their payment in phase 2 was independent of their ability to match their labels correctly). For example, none
of our 18 fastest participants during phase 1 matched all of 
their labels correctly, and — excluding participants we be- 
lieve left their computer during phase 1 (e.g., took longer 
than 20 minutes) — on average participants who failed to 
match at least five labels correctly took 2 minutes less time 
to complete phase 1 than participants who did. 

Time: We imagine that some web services may be reluctant to adopt GOTCHAs out of fear of driving away customers who don’t want to spend time labeling Inkblot images [21]. However, we believe that for many high security applications (e.g., online banking) the extra security benefits of GOTCHAs will outweigh the costs. GOTCHAs might even help a bank keep its customers by providing extra assurance that users’ passwords are secure. We are looking at modifying our Inkblot generation algorithm to produce Inkblots which require less “mental effort” to label. In particular, could techniques like Perlin Noise [B4] be used to generate Inkblots that can be labeled more quickly and matched more accurately?

Accuracy: We believe that the usability of our Inkblot 
Matching GOTCHA construction can still be improved. One 
simple way to improve the usability of our GOTCHA con- 
struction would be to allow the user to reject Inkblot images 
that were confusing. We also believe that usability could be 
improved by providing users with specific strategies for cre- 
ating their labels (e.g., we found that simple labels like “a 
voodoo mask” were often mismatched, while more elaborate 
stories like “A happy guy on the ground, protecting himself 
from ticklers” were rarely mismatched). 


3.3 An Open Challenge to the AI Community 


We envision a rich interaction between the security com- 
munity and the artificial intelligence community. To facili- 
tate this interaction we present an open challenge to break 
our GOTCHA scheme. 


Challenge Setup. 


We chose several random passwords $(pw_1, \ldots, pw_4) \xleftarrow{\$} \{0, \ldots, 10^7\}$ and $pw_5 \xleftarrow{\$} \{0, \ldots, 10^8\}$. We used a function GenerateInkblots$(pw_i, 10)$ to generate ten inkblots $I_1^i, \ldots, I_{10}^i$ for each password, and we had a human label each inkblot image $(\ell_1^i, \ldots, \ell_{10}^i) \leftarrow H((I_1^i, \ldots, I_{10}^i), \sigma_0)$. We selected a random permutation $\pi_i : [10] \to [10]$ for each account, and generated the tuple


$T_i = \left(s_i, h(pw_i, s_i, \pi_i(1), \ldots, \pi_i(10)), \ell^i_{\pi_i(1)}, \ldots, \ell^i_{\pi_i(10)}\right)$,


where $s_i$ is a randomly selected salt value and $h$ is a cryptographic hash function. We are releasing the source code
that we used to generate the Inkblots and evaluate the hash 


function h along with the tuples Ti, ..., T5 — see 


Challenge: Recover each password $pw_i$.


Approaches. 

One way to accomplish this goal would be to enumerate over every possible password guess $pw'$ and evaluate $h(pw', s_i, \pi(1), \ldots, \pi(10))$ for every possible permutation $\pi : [10] \to [10]$. However, the goal of this challenge is to see if AI techniques can be applied to attack our GOTCHA construction. We intentionally selected our passwords from a smaller space to make the challenge more tractable for AI based attacks, but to discourage participants from trying to brute force over all password/permutation pairs we used BCRYPT (Level 15), an expensive hash function, to hash the passwords. Our implementation allows the Inkblot images to be generated very quickly from a password guess $pw'$, so an AI program that can use the labels in the password file to distinguish between the correct Inkblots returned by GenerateInkblots$(pw_i, 10)$ and incorrect Inkblots returned by GenerateInkblots$(pw', 10)$ would be able to quickly dismiss incorrect guesses. Similarly, an AI program which generates a small set of likely permutations for each password guess could allow an attacker to quickly dismiss incorrect guesses.
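To give a sense of scale for the exhaustive approach, the following Python sketch counts hash evaluations. It assumes a password space of roughly $10^7$ values and $10!$ candidate matchings per guess, as described above; the per-hash time passed to `years_of_work` is a free parameter supplied by the reader, not a measured BCRYPT figure.

```python
from math import factorial

PASSWORD_SPACE = 10**7         # approximate size for pw_1..pw_4 (about 10**8 for pw_5)
PERMUTATIONS = factorial(10)   # candidate matchings pi : [10] -> [10]


def brute_force_hashes(password_space: int = PASSWORD_SPACE,
                       perms_per_guess: int = PERMUTATIONS) -> int:
    """Worst-case number of hash evaluations for an exhaustive search."""
    return password_space * perms_per_guess


def years_of_work(num_hashes: int, seconds_per_hash: float) -> float:
    """Sequential running time in years for a given (assumed) cost per hash."""
    return num_hashes * seconds_per_hash / (365 * 24 * 3600)


if __name__ == "__main__":
    total = brute_force_hashes()
    print(f"exhaustive search: {total:.3e} hash evaluations")
    # An attack that narrows each guess to 10 likely matchings needs far fewer.
    print(f"10 matchings per guess: {brute_force_hashes(perms_per_guess=10):.3e}")
```

The second figure shows why a program that proposes only a handful of likely matchings per guess would undermine the scheme: the factor of $10!$ disappears from the attacker's cost.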


4. ANALYSIS: COST OF OFFLINE ATTACKS 


In this section we argue that our password scheme (Protocols 3.1 and 3.2) significantly mitigates the threat of offline attacks. An informal interpretation of our main technical result, Theorem 1, is that either (1) the adversary’s offline attack is prohibitively expensive, (2) there is a good chance that the adversary’s offline attack will fail, or (3) the underlying GOTCHA construction can be broken. Observe that the security guarantees are still meaningful even if the security parameters $\epsilon$ and $\delta$ are not negligibly small.


THEOREM 1. Suppose that our user selects his password uniformly at random from a set $D$ (e.g., $pw \xleftarrow{\$} D$) and creates his account using Protocol 3.1. If algorithms $G_1$ and $G_2$ are an $(\epsilon, \delta, \mu)$-GOTCHA then no conservative offline adversary is $\left(C, \gamma + \epsilon + \delta + \frac{n_H}{|D|}, D\right)$-successful for $C < \gamma |D| 2^{\mu(k)} c_h + n_H c_H$.


Note on the BCRYPT level used in Section 3.3: the level parameter specifies the computational complexity of hashing. The amount of work necessary to evaluate the BCRYPT hash function increases exponentially with the level, so in our case the work increases by a factor of $2^{15}$.

Proof of Theorem 1. (Sketch) We use a hybrid argument. An adversary who breaches the server is able to recover the tuple $t = (u, r', s, h(u, s, pw, \pi(1), \ldots, \pi(k)), \ell_{\pi(1)}, \ldots, \ell_{\pi(k)})$ as well as the code for the cryptographic hash function $h$ and the code for our GOTCHA $(G_1, G_2)$.


1. World 0: $W_0$ denotes the real world in which the adversary has recovered the tuple

$t_0 = (u, r', s, h(u, s, pw, \pi(1), \ldots, \pi(k)), \ell_{\pi(1)}, \ldots, \ell_{\pi(k)})$

as well as the code for the cryptographic hash function $h$ and the code for our GOTCHA $(G_1, G_2)$. Because the adversary Adv is conservative it constructs the function

$\text{VerifyHash}(pw', \pi') = \begin{cases} 1 & \text{if } pw' = pw \text{ and } \pi' = \pi \\ 0 & \text{otherwise,} \end{cases}$

and uses VerifyHash as a blackbox. We say that Adv queries a human $H$ about password $pw'$ if it queries $H$ for $H(\text{GenerateInkblotImages}(1^k, \text{Extract}(pw', r')))$, and we let $D' \subseteq D$ denote the set of passwords for which the adversary queries a human.


2. World 1: $W_1$ denotes a hypothetical world that is similar to $W_0$ except that the VerifyHash function the adversary uses as a blackbox is replaced with the following incorrect version

$\text{VerifyHash}'(pw', \pi') = \begin{cases} 1 & \text{if } pw' \notin D',\ pw' = pw \text{ and } \pi' = \pi \\ 0 & \text{otherwise,} \end{cases}$

where $D' \subseteq D$ is a subset of passwords which denotes the set of passwords for which the adversary makes queries to a human in the real world.


3. World 2: $W_2$ denotes a hypothetical world that is similar to $W_1$ except that the VerifyHash$'$ function the adversary uses as a blackbox is replaced with the following incorrect version

$\text{VerifyHash}''(pw', \pi') = \begin{cases} 1 & \text{if } \pi' = R(G_2(1^k, \text{Extract}(pw', r'), \ell_1, \ldots, \ell_k)) \text{ and } pw' \notin D',\ pw' = pw \\ 0 & \text{otherwise,} \end{cases}$

where $R$ is a distribution with minimum entropy $\mu(k)$ as in the definition of a GOTCHA.


4. World 3: $W_3$ denotes a hypothetical world which is similar to world 2, except that the labels $\ell_{\pi(1)}, \ldots, \ell_{\pi(k)}$ are replaced with the labels $\ell'_{\pi'(1)}, \ldots, \ell'_{\pi'(k)}$ where $\pi' : [k] \to [k]$ is a new random permutation, and the labels $\ell'_i$ are for a completely unrelated set of Inkblot challenges

$(\ell'_1, \ldots, \ell'_k) \leftarrow H(G_1(1^k, x_1, x_2), \sigma_0)$,

where $x_1, x_2 \in \{0,1\}^n$ are freshly chosen random values.


In world 3 it is easy to bound the adversary’s probability of success. No adversary is $(C, \gamma, D)$-successful for $C < \gamma |D| 2^{\mu(k)} c_h$, because the fake Inkblot labels are not correlated with the actual Inkblots that were generated with the real password. Our particular adversary cannot be $(C, \gamma, D)$-successful for $C < \gamma |D| 2^{\mu(k)} c_h + |D'| c_H$. In world 2 the adversary might improve his chances of success by looking at the Inkblot labels, but by definition of an $(\alpha, \beta, \epsilon, \delta, \mu)$-GOTCHA his chances change by at most $\delta$. In world 1 the adversary might further improve his chances of success, but by definition of an $(\alpha, \beta, \epsilon, \delta, \mu)$-GOTCHA his chances improve by at most $\epsilon$. Finally, in world 0 the adversary improves his chances by at most $|D'|/|D|$ by querying the human about passwords in $D'$.
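As a reading aid, the hybrid argument in the proof sketch of Theorem 1 can be summarized as one chain of inequalities. The notation $p_j$, for the adversary's success probability in world $W_j$ under a fixed cost budget $C$, is introduced here for illustration and does not appear in the original sketch; the step sizes are the ones stated in the proof.

```latex
\begin{align*}
p_0 &\le p_1 + \frac{|D'|}{|D|}
      && \text{(human queries about passwords in } D' \text{)} \\
    &\le p_2 + \epsilon + \frac{|D'|}{|D|}
      && \text{(World 1 vs. World 2)} \\
    &\le p_3 + \delta + \epsilon + \frac{|D'|}{|D|}
      && \text{(World 2 vs. World 3)} \\
    &< \gamma + \delta + \epsilon + \frac{|D'|}{|D|}
      && \text{whenever } C < \gamma |D| 2^{\mu(k)} c_h + |D'| c_H .
\end{align*}
```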


5. DISCUSSION 


We conclude by discussing some key directions for future 
work. 


Other GOTCHA Constructions. 

Because GOTCHAs allow for human feedback during puz- 
zle generation — unlike HOSPs [14] — our definition poten- 
tially opens up a much wider space of potential GOTCHA 
constructions. One idea might be to have a user rate/rank 
random items (e.g., movies, activities, foods). By allowing 
human feedback we could allow the user to dismiss poten- 
tially confusing items (e.g., movies he hasn’t seen, foods 
about which he has no strong opinion). There is some evidence that this approach could provide security (e.g., Narayanan and Shmatikov showed that a Netflix user can often be uniquely identified from a few movie ratings [B2]).


Obfuscating CAPTCHAS. 

If it were possible to efficiently obfuscate programs then it would be easy to construct GOTCHAs from CAPTCHAs (e.g., just obfuscate a program that returns the CAPTCHA without the answer). Unfortunately, there is no general program obfuscator [8]. However, the approach may not be entirely hopeless. Point functions [A6] can be obfuscated, and our application is similar to a point function: the puzzle generator $G_2$ in a GOTCHA only needs to generate a human solvable puzzle for one input. Recently, multilinear maps have been used to obfuscate conjunctions [3] and to obfuscate $NC^1$ circuits [23]. Could similar techniques be used to obfuscate CAPTCHAs?


Exploiting The Power of Interaction. 

Can interaction be exploited and used to improve security or usability in human-authentication? While interaction is an incredibly powerful tool in computer security (e.g., nonces [86], zero-knowledge proofs [24], secure multiparty computation [48]) and in complexity theory, human authentication typically does not exploit interaction with the human (e.g., the user simply enters his password). We view the idea behind HOSPs and GOTCHAs, exploiting interaction to mitigate the threat of offline attacks, as a positive step in this direction. Could interaction be exploited to reduce memory burden on the user by allowing a user to reuse the same secret to authenticate to multiple different servers? The human-authentication protocol of Hopper, et al. [26], based on the noisy parity problem, could be used by a human to repeatedly authenticate over an insecure channel. Unfortunately, the protocol is slow and tedious for a human to execute, and it can be broken if the adversary is able to ask adaptive parity queries [B0].
APPENDIX
A. MISSING PROOFS 


Reminder of Claim 1. If $(G_1, G_2)$ is an $(\alpha, \beta, \epsilon, \delta, \mu)$-GOTCHA then at least a $\beta$-fraction of humans can successfully authenticate using Protocol 3.2 after creating an account using Protocol 3.1.

Proof of Claim 1. A legitimate user $H \in \mathcal{H}$ will use the same password in Protocols 3.1 and 3.2. Hence,


$r_1' = \text{Extract}(pw', r') = \text{Extract}(pw, r') = r_1$,


and the final matching challenge $\hat{c}_\pi$ is the same one that would be generated by $G_2(1^k, r_1, H(G_1(1^k, r_1, r_2), \sigma_0))$. If $\hat{c}_\pi$ is consistently solvable with accuracy $\alpha$ by $H$ (by the definition of a GOTCHA this is the case for at least a $\beta$-fraction of users) then it follows that


$d_k(\pi, \pi') \leq \alpha$,


where $\pi' = H(\hat{c}_\pi, \sigma_t)$ is the answer the user sends in Protocol 3.2. For some $\pi_0$ (namely $\pi_0 = \pi$) s.t. $d_k(\pi_0, \pi') \leq \alpha$ it must be the case that


$h_{pw,0} = h(u, s, pw', \pi_0(1), \ldots, \pi_0(k)) = h(u, s, pw, \pi(1), \ldots, \pi(k)) = h_{pw}$,


and Protocol 3.2 accepts.


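Reminder of Claim 2. For all permutations $\pi : [k] \to [k]$ and $\alpha \geq 0$


$\left|\{\pi' \mid d_k(\pi, \pi') \leq \alpha\}\right| \leq 1 + \sum_{i=2}^{\alpha} \binom{k}{i} i!$.


Proof of Claim 2. It suffices to show that $\binom{k}{j} j! \geq \left|\{\pi' \mid d_k(\pi, \pi') = j\}\right|$. The leading term 1 counts $\pi$ itself, and the sum starts at $i = 2$ because no permutation differs from $\pi$ in exactly one position. We first choose the $j$ unique indices $i_1, \ldots, i_j$ on which $\pi$ and $\pi'$ differ; there are $\binom{k}{j}$ ways to do this. Once we have fixed our indices $i_1, \ldots, i_j$ we define $\pi'(m) = \pi(m)$ for each $m \notin \{i_1, \ldots, i_j\}$. Now $j!$ upper bounds the number of ways of selecting the remaining values $\pi'(i_m)$ s.t. $\pi(i_m) \neq \pi'(i_m)$ for all $m \leq j$.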
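The counting step in this proof can be checked by brute force for a small number of Inkblots. The Python sketch below enumerates all permutations for $k = 6$, groups them by the number of positions in which they differ from the identity, and compares each group size with $\binom{k}{j} j!$. The choice of $k = 6$ and of the identity as the reference permutation are assumptions made to keep the enumeration small.

```python
from itertools import permutations
from math import comb, factorial


def distance(p, q) -> int:
    """Number of positions on which two permutations differ."""
    return sum(a != b for a, b in zip(p, q))


def counts_by_distance(k: int) -> list[int]:
    """counts[j] = number of permutations at distance exactly j from the identity."""
    identity = tuple(range(k))
    counts = [0] * (k + 1)
    for p in permutations(range(k)):
        counts[distance(identity, p)] += 1
    return counts


def bound_holds(k: int) -> bool:
    """Check C(k, j) * j! >= |{pi' : d_k(pi, pi') = j}| for every j."""
    counts = counts_by_distance(k)
    return all(counts[j] <= comb(k, j) * factorial(j) for j in range(k + 1))


if __name__ == "__main__":
    print(counts_by_distance(6))
    print(bound_holds(6))
```

The entry for distance 1 is always zero, which matches the observation that the sum in Claim 2 can start at $i = 2$.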


B. HOSP: PRE-GENERATED CAPTCHAS 


The HOSP construction proposed by [A] was to simply fill several high capacity hard drives with randomly generated CAPTCHAs, discarding the solutions. Once we have compiled a large database $D$ of CAPTCHAs we can use Algorithm 4 as our challenge generator: simply return a random CAPTCHA from $D$. The advantage of this approach is that we can make use of already tested CAPTCHA solutions so there is no need to make hardness assumptions about new AI problems. The primary disadvantage of this approach is that the size of the database $D$ will be limited by economic considerations, since storage isn’t free. While $|D|$, the number of CAPTCHAs that could be stored on a hard drive, may be large, it is not exponentially large. An adversary could theoretically pay humans to solve every puzzle in $D$, at which point the scheme would be completely broken.


Algorithm 4 GenerateChallenge


Input: Random bits $r \in \{0,1\}^n$, Database $D = \{P_1, \ldots, P_{2^n}\}$ of CAPTCHAs
return $P_r$


Economic Cost.
Suppose that two 4 TB hard drives are filled with text CAPTCHAs. Let $S$ be the space required to store one CAPTCHA, and let $C_H$ denote the cost of paying a human to solve a CAPTCHA. We use the values $S = 8$ KB and $C_H = \$0.001$. In this case $|D| = \frac{8\ \text{TB}}{S} \approx 10^9$ so we can store a billion unsolved CAPTCHAs on the hard drives. It would cost the adversary $|D| C_H = \$1{,}000{,}000$ to solve all of the CAPTCHAs, or $500,000 to solve half of them. The up front cost of this attack may be large, but once the adversary has solved the CAPTCHAs he can execute offline dictionary attacks against every user who had an account on the server. Many server breaches have resulted in the release of password records for millions of accounts [5, Ø, Ø, fi]. If each cracked password is worth between $4 and $30 [22] then it may be easily worth the cost to pay humans to solve every CAPTCHA in $D$.


13 At the time of submission a 4 TB hard drive can be purchased on Amazon for less than $162.

14 The exact value of $S$ may vary slightly depending on the particular method used to generate the CAPTCHA. When we compressed a text CAPTCHA using the popular GIF format the resulting files were consistently 8 KB.

15 Motoyama, et al. estimated that spammers paid humans $1 to solve a thousand CAPTCHAs [BI].
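The economic argument above is simple arithmetic, reproduced in the Python sketch below. It assumes decimal units (1 TB = $10^{12}$ bytes, 1 KB = $10^3$ bytes) and uses the paper's figures for drive size, CAPTCHA size, solving cost and the lower value of a cracked password; the variable names are illustrative.

```python
# Storage: two 4 TB drives, 8 KB per stored CAPTCHA (decimal units assumed).
drive_bytes = 2 * 4 * 10**12
captcha_bytes = 8 * 10**3
num_captchas = drive_bytes // captcha_bytes          # |D|

# Solving cost: $1 per thousand CAPTCHAs, i.e. C_H = $0.001 each.
dollars_per_thousand = 1
cost_all = num_captchas * dollars_per_thousand // 1000
cost_half = cost_all // 2

# Break-even: accounts needed if each cracked password is worth at least $4.
min_value_per_password = 4
break_even_accounts = cost_all // min_value_per_password

if __name__ == "__main__":
    print(f"|D| = {num_captchas:,} CAPTCHAs")
    print(f"cost to solve all:  ${cost_all:,}")
    print(f"cost to solve half: ${cost_half:,}")
    print(f"break-even at $4 per password: {break_even_accounts:,} accounts")
```

Under these figures, a breach exposing a few hundred thousand accounts would already cover the cost of solving the whole database, which is consistent with the paper's conclusion.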


